Abstract

The analysis of massive graphs is now becoming a very 
important part of science and industrial research. This 
has led to the construction of a large variety of graph 
models, each with their own advantages. The Stochastic 
Kronecker Graph (SKG) model has been chosen by the 
Graph500 steering committee to create supercomputer 
benchmarks for graph algorithms. The major reasons 
for this are its easy parallelization and ability to mirror 
real data. Although SKG is easy to implement, there 
is little understanding of the properties and behavior of 
this model. 

We show that the parallel variant of the edge- 
configuration model given by Chung and Lu (referred 
to as CL) is notably similar to the SKG model. The 
graph properties of an SKG are extremely close to 
those of a CL graph generated with the appropriate 
parameters. Indeed, the final probability matrix used 
by SKG is almost identical to that of a CL model. This 
implies that the graph distribution represented by SKG 
is almost the same as that given by a CL model. We 
also show that when it comes to fitting real data, CL 
performs as well as SKG based on empirical studies 
of graph properties. CL has the added benefit of a 
trivially simple fitting procedure and exactly matching 
the degree distribution. Our results suggest that users 
of the SKG model should consider the CL model because 
of its similar properties, simpler structure, and ability 
to fit a wider range of degree distributions. At the very 
least, CL is a good control model to compare against. 

1 Introduction

With more and more data being represented as large 
graphs, network analysis is becoming a major topic 
of scientific research. Data that come from social 
networks, the Web, patent citation networks, and power 
grid structures are increasingly being viewed as massive 
graphs. These graphs usually have peculiar properties 
that distinguish them from standard random graphs 
(like those generated from the Erdos-Renyi model). 
Although we have a lot of evidence for these properties, 
we do not have a thorough understanding of why these 
properties occur. Furthermore, it is not at all clear 
how to generate synthetic graphs that have a similar 
behavior. 

Hence, graph modeling is a very important topic 
of study. There may be some disagreement as to the 
characteristics of a good model, but the survey [1] gives 
a fairly comprehensive list of desired properties. As 
we deal with larger and larger graphs, the efficiency 
and speed as well as implementation details become 
deciding factors in the usefulness of a model. The 
theoretical benefit of having a good, fast model is quite 
clear. However, the benefits of having good models go 
beyond an ability to generate large graphs, since such 
models provide insight into structural properties and 
the processes that generate large graphs. 

The Stochastic Kronecker graph (SKG) [2, 3], a
generalization of the recursive matrix (R-MAT) model [4],
is a model for large graphs that has received a lot of
attention. It involves few parameters and has an
embarrassingly parallel implementation (so each edge of
the graph can be independently generated). The importance
of this model cannot be overstated: it has been chosen
to create graphs for the Graph500 supercomputer
benchmark [5]. Moreover, many researchers generate
SKGs for testing their algorithms [6, 7, 8, 9, 10, 11, 12,
13, 14, 15].

Despite the role of this model in graph benchmarking
and algorithm testing, precious little is truly known
about its properties. The model description is quite
simple, but varying the parameters of the model can have
quite drastic effects on the properties of the graphs
being generated. Understanding what goes on while
generating an SKG is extremely difficult. Indeed, merely
explaining the structure of the degree distribution
requires a significant amount of mathematical effort.

Could there be a conceptually simpler model that 
has properties similar to SKG? A possible candidate is a 
simple variant of the Erdos-Renyi model first discussed 
by Aiello, Chung, and Lu [16] and generalized by Chung 
and Lu [17, 18]. The Erdos-Renyi model is arguably the 
earliest and simplest random graph model [19, 20]. The 
Chung-Lu model (referred to as CL) can be viewed as 
a version of the edge configuration model or a weighted 
Erdos-Renyi graph. Given any degree distribution, it 
generates a random graph with the same distribution 
on expectation. (The version by Aiello et al. only 
considered power law distributions.) It is very efficient 
and conceptually very simple. Amazingly, it has been
overlooked as a model to generate synthetic instances,
and is not even considered as a "control model" to 
compare with. (This is probably because of the strong 
ties to a standard Erdos-Renyi graph, which is well 
known to be unsuitable for modeling social networks.) 
A major benefit of this model is that it can provide 
graphs with any desired degree distribution (especially 
power law), something that SKG provably cannot do. 

Our aim is to provide a detailed comparison of the 
SKG and CL models. We first compare the graph prop- 
erties of an SKG graph with an associated CL graph. 
We then look at how these models fit real data. Our ob- 
servations show a great deal of similarity between these 
models. To explain this, we look directly at the proba- 
bility matrices used by these models. This gives insight 
into the structure of the graphs generated. We notice 
that the SKG and CL matrices have much in common 
and give evidence that the differences between these are 
only (slightly) quantitative, not qualitative. We also 
show that for some settings of the SKG parameters, the 
SKG and associated CL models coincide exactly. 

1.1 Notation and Background 

1.1.1 Stochastic Kronecker Graph (SKG) model The
model takes as input the number of nodes $n$ (always a
power of 2), the number of edges $m$, and a $2 \times 2$
generator matrix $T$. We define $\ell = \log_2 n$ as the
number of levels. In theory, the SKG generating matrix
can be larger than $2 \times 2$, but we are unaware of any
such examples in practice. Thus, we assume that the
generating matrix has the form

\[ T = \begin{bmatrix} t_1 & t_2 \\ t_3 & t_4 \end{bmatrix} \quad \text{with } t_1 + t_2 + t_3 + t_4 = 1. \]

Each edge is inserted according to the probabilities
(see footnote 1) defined by

\[ P_{SKG} = \underbrace{T \otimes T \otimes \cdots \otimes T}_{\ell \text{ times}}. \]

We will refer to $P_{SKG}$ as the SKG matrix associated
with these parameters. Observe that the entries in $P_{SKG}$
sum up to 1, and hence it gives a probability distribution
over all pairs $(i, j)$. This is the probability that a single
edge insertion results in the edge $(i, j)$. By repeatedly
using this distribution to generate $m$ edges, we obtain
our final graph.

In practice, the matrix $P_{SKG}$ is never formed
explicitly. Instead, each edge is inserted as follows. Divide
the adjacency matrix into four quadrants, and choose one
of them with the corresponding probability $t_1$, $t_2$, $t_3$,
or $t_4$. Once a quadrant is chosen, repeat this recursively
in that quadrant. Each time we iterate, we end up in a
square submatrix whose dimensions are exactly halved.
After $\ell$ iterations, we reach a single cell of the adjacency
matrix, and an edge is inserted. This is independently
repeated $m$ times to generate the final graph. Note that
all edges can be inserted in parallel. This is one of the
major advantages of the SKG model and why it is
appropriate for generating large supercomputer benchmarks.
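
The quadrant descent described above can be sketched in a few lines of Python. This is only a minimal illustration, not the Graph500 reference generator: the function names `skg_edge` and `skg_edges`, the 0-based vertex labels, and the quadrant order (t1 top-left, t2 top-right, t3 bottom-left, t4 bottom-right) are assumptions made for the example.

```python
import random

def skg_edge(T, levels, rng):
    """Sample one (row, col) cell by recursive quadrant descent.

    T = (t1, t2, t3, t4), assumed to sum to 1, with the quadrants ordered
    top-left, top-right, bottom-left, bottom-right (an assumption).
    """
    row = col = 0
    for _ in range(levels):
        q = rng.choices(range(4), weights=T)[0]
        row = 2 * row + (q // 2)  # bottom half appends a 1 bit to the row label
        col = 2 * col + (q % 2)   # right half appends a 1 bit to the column label
    return row, col

def skg_edges(T, levels, m, seed=0):
    """Perform m independent edge insertions (repeated edges are possible)."""
    rng = random.Random(seed)
    return [skg_edge(T, levels, rng) for _ in range(m)]

# Toy run with the Graph500 generator matrix on 2**4 = 16 vertices.
edges = skg_edges((0.57, 0.19, 0.19, 0.05), levels=4, m=64)
```

Since every call to `skg_edge` is independent of the others, the m insertions could be divided among workers, each with its own random stream.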

A noisy version of SKG (called NSKG) has been
recently designed in [21, 22]. This chooses the probability
matrix

\[ P = T_1 \otimes \cdots \otimes T_\ell, \]

where each $T_i$ is a specific random perturbation of the
original generator matrix $T$. This has been provably
shown to smooth the degree distribution to a lognormal
form.

1.2 Chung-Lu (CL) model This model can be
thought of as a variant of the edge configuration model.
Let us deal with directed graphs to describe the CL
model. Suppose we are given sequences of $n$ in-degrees
$d_1, d_2, \ldots, d_n$, and $n$ out-degrees $d'_1, d'_2, \ldots, d'_n$.
We have $\sum_i d_i = \sum_i d'_i = m$. Consider the probability
matrix $P_{CL}$ where the $(i, j)$ entry is the out-degree of $i$
times the in-degree of $j$, divided by $m^2$, that is,
$d'_i d_j / m^2$. (The sum of all entries in $P_{CL}$ is 1.) We use
this probability matrix to make $m$ edge insertions.

This is slightly different from the standard CL
model, where an independent coin flip is done for every
edge. This is done by using the matrix $m P_{CL}$ (similar
to SKG). In practice, we do not generate $P_{CL}$ explicitly,
but have a simple $O(m)$ implementation analogous
to that for SKG. Independently for every edge, we
choose a source and a sink. Both of these are chosen
independently using the degree sequences as probability
distributions. This is extremely simple to implement
and it is very efficient.

1 We have taken a slight liberty in requiring the entries
$t_1, t_2, t_3, t_4$ of $T$ to sum to 1. In fact, the SKG model
as defined in [3] works with the matrix $mP$, which is
considered the matrix of probabilities for the existence
of each individual edge (though it might be more accurate
to think of it as an expected value).
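
The source-and-sink sampling for CL described above can be sketched as follows. This is a minimal sketch under stated assumptions: the name `cl_edges` and the 0-based vertex labels are hypothetical, and the binary search used by `random.choices` makes this version cost O(n + m log n) rather than the O(m) mentioned in the text.

```python
import random
from itertools import accumulate

def cl_edges(out_deg, in_deg, m, seed=0):
    """Make m directed edge insertions, choosing each source in proportion
    to its out-degree and each sink in proportion to its in-degree."""
    rng = random.Random(seed)
    n = len(out_deg)
    # Cumulative weights let random.choices locate each draw by binary search.
    out_cum = list(accumulate(out_deg))
    in_cum = list(accumulate(in_deg))
    sources = rng.choices(range(n), cum_weights=out_cum, k=m)
    sinks = rng.choices(range(n), cum_weights=in_cum, k=m)
    return list(zip(sources, sinks))

# Toy degree sequences: vertex 2 has out-degree 0, vertex 3 has in-degree 0.
sample = cl_edges(out_deg=[3, 1, 0, 2], in_deg=[1, 2, 3, 0], m=100)
```

The source and the sink of an edge are drawn independently of each other, so the probability of the pair (i, j) in a single insertion is the product of the two normalized degrees, which is the corresponding entry of the CL probability matrix.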

We will focus on undirected graphs for the rest of 
this paper. This is done by performing m edge inser- 
tions, and considering each of these to be undirected. 
For real data that is directed, we symmetrize by remov- 
ing the direction. 

Given a set of SKG parameters, we can define the
associated CL model. Any set of SKG parameters
immediately defines an expected degree sequence. In
other words, given the SKG matrix $P_{SKG}$, we can deduce
the expected in- (and out)-degrees of the vertices. For
this degree sequence, we can define a CL model. We
refer to this as the associated CL model for a given set
of SKG parameters. This CL model will be used to
define a probability matrix $P_{CL}$. In this paper, we will
study the relations between $P_{SKG}$ and $P_{CL}$. Whenever
we use the term $P_{CL}$, this will always be the associated
CL model of some SKG matrix $P_{SKG}$.
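
As a rough illustration of how an expected degree sequence follows from the SKG parameters, the sketch below uses the fact (derived in the proof of Claim 4.1) that each row or column sum of the SKG matrix factors over the bits of the vertex label. The function name `expected_degrees` and the 0-based labels are assumptions made for this example.

```python
def expected_degrees(T, levels, m):
    """Expected out- and in-degrees of every vertex of a directed SKG.

    T = ((t1, t2), (t3, t4)); vertex labels are 0 .. 2**levels - 1, and a
    label with z zero bits has row sum (t1+t2)**z * (t3+t4)**(levels-z).
    """
    (t1, t2), (t3, t4) = T
    out_deg, in_deg = [], []
    for v in range(1 << levels):
        z = levels - bin(v).count("1")  # number of zero bits in the label
        out_deg.append(m * (t1 + t2) ** z * (t3 + t4) ** (levels - z))
        in_deg.append(m * (t1 + t3) ** z * (t2 + t4) ** (levels - z))
    return out_deg, in_deg

# Toy run: Graph500 generator matrix, 16 vertices, 100 edge insertions.
out_deg, in_deg = expected_degrees(((0.57, 0.19), (0.19, 0.05)), levels=4, m=100)
```

Feeding these two sequences to a CL generator gives the associated CL model for the chosen SKG parameters.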

1.3 Our Contributions The main message of this 
work can be stated simply. The SKG model is close 
enough to its associated CL model that most users of 
SKG could just as well use the CL model for generating 
graphs. These models have very similar properties both 
in terms of ease of use and in terms of the graphs they 
generate. Moreover, they both reflect real data to the 
same extent. The general CL model has the major 
advantage of generating any desired degree distribution. 

We stress that we do not claim that the CL model 
accurately represents real graphs, or is even the "right" 
model to think about. But we feel that it is a good 
control model, and it is one that any other model should 
be compared against. Fitting CL to a given graph is 
quite trivial; simply feed the degree distribution of the 
real graph to the CL model. Our results suggest that 
users of SKG can satisfy most of their needs with a CL 
model. 

We provide evidence for this in three different ways. 

1. Graph properties of SKG vs CL: We construct an 
SKG using known parameter choices from the Graph500 
specification. We then generate CL graphs with the 
same degree distributions. The comparison of graph 
properties is very telling. The degree distributions are
naturally very similar. What is surprising is that the
clustering coefficients, eigenvalues, and core
decompositions match exceedingly well. Note that the CL
model can be thought of as a uniform random sample of
graphs with an input degree distribution. It appears
that SKG is very similar, where the degree distribution 
is given implicitly by the generator matrix T. 



2. Quantitative comparison of generating matrices
$P_{SKG}$ and $P_{CL}$: We propose an explanation of these
observations based on comparisons of the probability
matrices of SKG and CL. We plot the entries of these
matrices in various ways, and arrive at the conclusion
that these matrices are extremely similar. More
concretely, they represent almost the same distribution on
graphs, and differences are very slight. This strongly
suggests that the CL model is a good and simple
approximation of SKG, and it has the additional benefit
of modeling any degree distribution. We prove that
under a simple condition on the matrix $T$, $P_{SKG}$ is
identical to $P_{CL}$. Although this condition is often not
satisfied by common SKG parameters, it gives strong
mathematical intuition behind the similarities.

3. Comparing SKG and CL to real data: The pop- 
ularity of SKG is significantly due to fitting procedures 
that compute SKG parameters corresponding to real 
graphs [3]. This is based on an expensive likelihood 
optimization procedure. Contrast this with CL, which 
has a trivial fitting mechanism. We show that both 
these models do a similar job of matching graph param- 
eters. Indeed, CL guarantees to fit the degree distri- 
bution (up to expectations). In other graph properties, 
neither SKG nor CL is clearly better. This is a very 
compelling reason to consider the CL model as a con- 
trol model. 

In this paper, we focus primarily on SKG instead of 
the noisy version NSKG because SKG is extremely well 
established and used by a large number of researchers 
[6, 7, 8, 9, 10, 15, 11, 12, 13, 14]. Nonetheless, all 
our experiments and comparisons are also performed 
with NSKG. Other than correcting deficiencies in the 
degree distribution, the effect of noise on other graph 
properties seems fairly small. Hence, for our matrix 
studies and mathematical theorems (§4 and §5), we 
focus on similarities between SKG and CL. We however 
note that all our empirical evidence holds for NSKG as 
well: CL seems to model NSKG graphs reasonably well 
(though not as perfectly as SKG), and CL fits real data 
as well as NSKG. 

1.4 Parameters for empirical study We focus 
attention on the Graph500 benchmark [5]. This is 
primarily for concreteness and the relative importance 
of this parameter setting. Our results hold for all the 
settings of parameters that we experimented with. For 
NSKG, there is an additional noise parameter required. 
We set this to 0.1, the setting studied in [21]. 

• Graph500: $T = [0.57, 0.19; 0.19, 0.05]$, $\ell \in \{26,
29, 32, 36, 39, 42\}$, and $m = 16 \cdot 2^{\ell}$. We focus on a
much smaller setting, $\ell = 18$.

Figure 1: Comparison of the graph properties of SKG generated with Graph500 parameters and an equivalent
CL.



2 Previous Work 

The SKG model was proposed by Leskovec et al. [23], 
as a generalization of the R-MAT model, given by 
Chakrabarti et al. [4]. Algorithms to fit SKG to 
real data were given by Leskovec and Faloutsos [2] 
(extended in [3]). This model has been chosen for 
the Graph500 benchmark [5]. Kim and Leskovec [24] 
defined a variant of SKG called the Multiplicative 
Attribute Graph (MAG) model. 

There have been various analyses of the SKG model. 
The original paper [3] provides some basic theorems 
and empirically shows a variety of properties. Groer 
et al. [25], Mahdian and Xu [26], and Seshadhri et al. 
[21] study how the model parameters affect the graph 
properties. It has been conclusively shown that SKG 
cannot generate power-law distributions [21]. Seshadhri 
et al. also proposed noisy SKG (NSKG), which can 
provably produce lognormal degree distributions. 

Sala et al. [27] perform an extensive empirical study 
of properties of graph models, including SKGs. Miller 
et al. [28] give algorithms to detect anomalies embedded
in an SKG. Moreno et al. [29] study the distributional
properties of families of SKGs.

A good survey of the edge-configuration model and 
its variants is given by Newman [30] (refer to Section 
IV. B). The specific model of CL was first given by 
Chung and Lu [17, 18]. They proved many properties 
of these graphs. Properties of its eigenvalues were given 
by Mihail and Papadimitriou [31] and Chung et al. [32]. 

3 Similarity between SKG and CL 

Our first experiment details the similarities between an 
SKG and its equivalent CL. We construct an SKG using 
the Graph500 parameters with $\ell = 18$. We take the
degree distribution of this graph, and construct a CL 
graph using this. Various properties of these graphs are 
given in Fig. 1. We give details below: 

1. Degree distribution (Fig. 1a): This is the standard
degree distribution plot in log-log scale. It is no
surprise that the degree distributions are almost
identical. After all, the weighting of CL is done precisely
to match this.

Figure 2: Comparison of the graph properties of NSKG generated with Graph500 parameters and an
equivalent CL.



2. Clustering coefficients (Fig. 1b): The clustering
coefficient of a vertex i is the fraction of wedges centered 
at i that participate in triangles. We plot d versus 
the average clustering coefficient of a degree d vertex 
in log-log scale. Observe the close similarity. Indeed, 
we measure the difference between clustering coefficient 
values at d to be at most 0.04 (a lower order term with 
respect to commonly measured values in real graphs 
[33]). 

3. Eigenvalues (Fig. 1c): Here, we plot the first 25
eigenvalues (in absolute value) of the adjacency matrix
of the graph in log-scale. The proximity of eigenvalues
is very striking. This is a strong suggestion that the
graph structures of the SKG and CL graphs are very similar.

4. Assortativity (Fig. 1d): This is a non-standard
measure, but we feel that it provides a lot of structural
intuition. Social networks are often seen to be
assortative [34, 35], which means that vertices of similar
degree tend to be connected by edges. For each degree $d$,
define $X_d$ to be the average degree of the neighbors of
an average degree-$d$ vertex. We plot $d$ versus $X_d$ in
log-log scale. Note that neither SKG nor CL is
particularly assortative, and the plots match rather well.

5. Core decompositions (Fig. 1e): The $k$-cores of a
graph are a very important part of understanding the
community structure of a graph. The $k$-core is the
largest induced subgraph in which each vertex has degree
at least $k$; that is, the largest subset $S$ of vertices
such that every vertex in $S$ has at least $k$ neighbors
in $S$. The sizes of the $k$-cores can be quickly determined
by performing a core decomposition. This is obtained by
iteratively deleting the minimum degree vertex of the
graph. The core plots look very close, and the only
difference is that there are slightly larger cores in CL.
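
The core decomposition mentioned in item 5 can be sketched as below. This is a simple heap-based version written for illustration (it is not the linear-time bucket algorithm, and not necessarily the code used for the plots); the names `core_numbers` and `core_sizes` are hypothetical, and vertex labels are assumed to be integers so that heap ties can be compared.

```python
import heapq
from collections import defaultdict

def core_numbers(edges):
    """Core number of each vertex, found by repeatedly deleting a vertex
    of minimum remaining degree from the undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # self-loops are ignored
            adj[u].add(v)
            adj[v].add(u)
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    core, removed, k = {}, set(), 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != deg[v]:
            continue  # stale heap entry left over from an earlier update
        k = max(k, d)  # the core level never decreases during peeling
        core[v] = k
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                deg[w] -= 1
                heapq.heappush(heap, (deg[w], w))
    return core

def core_sizes(core):
    """Size of the k-core for each k: the vertices with core number >= k."""
    kmax = max(core.values(), default=0)
    return {k: sum(1 for c in core.values() if c >= k) for k in range(1, kmax + 1)}

# Toy graph: a triangle on {0, 1, 2} with a pendant vertex 3 attached to 2.
toy_core = core_numbers([(0, 1), (1, 2), (0, 2), (2, 3)])
toy_sizes = core_sizes(toy_core)
```

A plot in the style of Fig. 1e would then show k against the size of the k-core.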

All these plots clearly suggest that the Graph500 
SKG and its equivalent CL graph are incredibly close 
in their graph properties. Indeed, it appears that most 
important structural properties (especially from a social 
networks perspective) are closely related. We will show 
in §6 that CL performs an adequate job of fitting real 
data, and is quite comparable to SKG. We feel that
any use of SKG for benchmarking or test instance
generation could probably be done with CL graphs as
well.

For completeness, we plot the same comparisons 
between NSKG and CL in Fig. 2. We note again that 
the properties are very similar, though NSKG shows 
more variance in its values. Clustering coefficient values 
differ by at most 0.02 here, and barring small differences 
in initial eigenvalues, there is a very close match. The 
assortativity plots show more oscillations for NSKG, but
CL gets the overall trajectory. 

4 Connection between SKG and CL matrices 

Is there a principled explanation for the similarity 
observed in Fig. 1? It appears to be much more than 
a coincidence, considering the wide variety of graph 
properties that match. In this section, we provide an 
explanation based on the similarity of the probability 
matrices $P_{SKG}$ and $P_{CL}$. On analyzing these matrices,
we see that they have an extremely close distribution of
values. That the matrices themselves are so fundamentally
similar provides more evidence that SKG itself can be
modeled as CL.

We begin by giving precise formulae for the entries 
of the SKG and CL matrices. This is by no means new 
(or even difficult), but it should introduce the reader 
to the structure of these matrices. The vertices of the
graph are labeled from $[n]$ (the set of positive integers
up to $n$). For any $i$, $v_i$ denotes the binary representation
of $i$ as an $\ell$-bit vector. For two vectors $v_i$ and $v_j$, the
number of common zeroes is the number of positions
where both vectors are 0. The following formula for
the SKG entries has already been used in [4, 23, 25].
Observe that these entries (for both SKG and CL) are
quite easy to compute and enumerate.

Claim 4.1. Let $i, j \in [n]$. Let the number of zeroes in
$v_i$ and $v_j$ be $z_i$ and $z_j$ respectively. Let the number of
common zeroes be $c_z$. Then

\[ P_{SKG}(i,j) = t_1^{c_z}\, t_2^{z_i - c_z}\, t_3^{z_j - c_z}\, t_4^{\ell - z_i - z_j + c_z}, \]

\[ P_{CL}(i,j) = (t_1 + t_2)^{z_i} (t_3 + t_4)^{\ell - z_i} (t_1 + t_3)^{z_j} (t_2 + t_4)^{\ell - z_j}. \]

Proof. The number of positions where $v_i$ is zero but $v_j$
is one is $z_i - c_z$. Analogously, the number of positions
where only $v_j$ is zero is $z_j - c_z$. The number of common
ones is $\ell - z_i - z_j + c_z$. Hence, the $(i, j)$ entry of
$P_{SKG}$ is $t_1^{c_z} t_2^{z_i - c_z} t_3^{z_j - c_z} t_4^{\ell - z_i - z_j + c_z}$.

Let us now compute the $(i, j)$ entry in the corresponding
CL matrix. The probability that a single edge
insertion becomes an out-edge of vertex $i$ in SKG is
$(t_1 + t_2)^{z_i} (t_3 + t_4)^{\ell - z_i}$. Hence, the expected out-degree
of $i$ is $m (t_1 + t_2)^{z_i} (t_3 + t_4)^{\ell - z_i}$. Similarly, the expected
in-degree of $j$ is $m (t_1 + t_3)^{z_j} (t_2 + t_4)^{\ell - z_j}$. The $(i, j)$
entry of $P_{CL}$ is the product of these two expected degrees
divided by $m^2$, which is
$(t_1 + t_2)^{z_i} (t_3 + t_4)^{\ell - z_i} (t_1 + t_3)^{z_j} (t_2 + t_4)^{\ell - z_j}$. □
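
The two formulas of Claim 4.1 are easy to check numerically on a small instance. The sketch below builds the explicit Kronecker power for a few levels and compares it, entry by entry, with the closed forms; all function names are hypothetical, and vertex labels are taken to be 0-based integers whose $\ell$-bit binary expansions play the role of the vectors in the claim.

```python
def kron_power(T, levels):
    """Explicit SKG matrix: T Kronecker-multiplied with itself `levels` times."""
    P = [[1.0]]
    for _ in range(levels):
        n = len(P)
        P = [[P[i // 2][j // 2] * T[i % 2][j % 2] for j in range(2 * n)]
             for i in range(2 * n)]
    return P

def zeros(i, levels):
    """Number of zero bits in the levels-bit label i."""
    return levels - bin(i).count("1")

def skg_entry(T, i, j, levels):
    """Closed form for the (i, j) entry of the SKG matrix (Claim 4.1)."""
    (t1, t2), (t3, t4) = T
    zi, zj = zeros(i, levels), zeros(j, levels)
    cz = zeros(i | j, levels)  # positions where both labels have a 0 bit
    return t1 ** cz * t2 ** (zi - cz) * t3 ** (zj - cz) * t4 ** (levels - zi - zj + cz)

def cl_entry(T, i, j, levels):
    """Closed form for the (i, j) entry of the associated CL matrix."""
    (t1, t2), (t3, t4) = T
    zi, zj = zeros(i, levels), zeros(j, levels)
    return ((t1 + t2) ** zi * (t3 + t4) ** (levels - zi)
            * (t1 + t3) ** zj * (t2 + t4) ** (levels - zj))

T = [[0.57, 0.19], [0.19, 0.05]]
P = kron_power(T, 3)
row = [sum(r) for r in P]
col = [sum(P[i][j] for i in range(8)) for j in range(8)]
skg_err = max(abs(P[i][j] - skg_entry(T, i, j, 3)) for i in range(8) for j in range(8))
cl_err = max(abs(row[i] * col[j] - cl_entry(T, i, j, 3)) for i in range(8) for j in range(8))
```

Both `skg_err` and `cl_err` should be zero up to floating-point rounding; the second comparison uses the fact that the CL entry is the product of a row sum and a column sum of the SKG matrix.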



The inspiration for this section comes from Fig. 3.
Our initial aim was to understand the SKG matrix,
and see whether the structure of the values
provides insight into the properties of SKG. Since
each entry in this probability matrix is of the form
$t_1^{c_z} t_2^{z_i - c_z} t_3^{z_j - c_z} t_4^{\ell - z_i - z_j + c_z}$, there are many repeated
values in this matrix. For each value in this probability
matrix, we simply plot the number of times (the
multiplicity) this value appears in the matrix. (For $P_{SKG}$,
this is given in red.) This is done for the associated $P_{CL}$
in blue. Note the uncanny similarity of the overall shapes
for SKG and CL. Clearly, $P_{SKG}$ has more distinct values
(see footnote 2), but they are distributed fairly similarly
to those of $P_{CL}$. Nonetheless, this picture is not very
formally convincing, since it only shows the overall
behavior of the distribution of values.




Figure 3: Distribution of entries of $P_{SKG}$ and $P_{CL}$ (axis label: value of entry).

Fig. 4 makes a more faithful comparison between
the $P_{SKG}$ and $P_{CL}$ matrices. As we note from Fig. 3, $P_{CL}$
has a much smaller set of distinct entries. Suppose the
distinct values in $P_{CL}$ are $v_1 > v_2 > v_3 > \cdots$. Associate a
bin with each distinct entry of $P_{CL}$. For each entry of
$P_{SKG}$, place it in the bin corresponding to the entry of
$P_{CL}$ with the closest value. So, if some entry in $P_{SKG}$
has value $v$, we determine the index $i$ such that $|v - v_i|$
is minimized. This entry is placed in the $i$th bin. We
can now look at the size of each bin for $P_{SKG}$. The size
of the $i$th bin for $P_{CL}$ is simply the multiplicity of
$v_i$ in $P_{CL}$.
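
The binning procedure just described can be sketched as follows. This is an illustrative sketch only: the function name `nearest_bin_sizes` is hypothetical, bins are keyed by the CL value itself rather than by its index, and treating floating-point values as "distinct" by exact equality is an assumption that a careful implementation might replace with rounding or exact exponent bookkeeping.

```python
from bisect import bisect_left
from collections import Counter

def nearest_bin_sizes(skg_values, cl_values):
    """Place every SKG entry in the bin of the closest distinct CL value.

    Returns (skg_bin_sizes, cl_bin_sizes), both keyed by the CL value.
    """
    distinct = sorted(set(cl_values))  # ascending order, as bisect requires
    skg_sizes = Counter()
    for v in skg_values:
        pos = bisect_left(distinct, v)
        # Only the neighbours just below and just above v can be closest.
        candidates = distinct[max(pos - 1, 0):pos + 1]
        nearest = min(candidates, key=lambda c: abs(v - c))
        skg_sizes[nearest] += 1
    cl_sizes = Counter(cl_values)  # the multiplicity of each distinct CL value
    return skg_sizes, cl_sizes

# Toy data standing in for the flattened entries of the two matrices.
skg_sizes, cl_sizes = nearest_bin_sizes([0.1, 0.11, 0.5, 0.9], [0.1, 0.1, 0.5, 1.0])
```

Multiplying each bin size by a representative value, or summing the entries placed in the bin, gives the per-bin probability mass instead of the per-bin count.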

2 This can be proven by inspecting Claim 4.1.

Observe how these sizes are practically identical for
large enough entry values. Indeed, the former portion of
these plots, for values $< 10^{-20}$, only accounts for a total
of $< 10^{-5}$ of the probability mass. This means that the
fraction of edges that will correspond to these entries
is at most $10^{-5}$. We can also argue that these entries
correspond only to edges joining very low degree vertices
to each other. In other words, the portion where these
curves differ is really immaterial to the structure of the
final graph generated.

This is very strong evidence that SKG behaves like a
CL model. The structures of the matrices are extremely
similar to each other. Fig. 5 is even more convincing.
Now, instead of just looking at the size of each bin, we
look at the total probability mass of each bin. For the
$P_{SKG}$ matrix, this is the sum of entries in a particular
bin. For $P_{CL}$, this is the product of the size of the bin
and the value (which is again just the sum of entries in
that bin). Again, we note the almost exact coincidence of
these plots in the regime where the probabilities matter.
Not only is the number of entries in each bin (roughly)
the same, so is the total probability mass in the bin.

We now generate a random sample from $P_{SKG}$ and
one from $P_{CL}$. Fig. 6 shows MATLAB spy plots of the
corresponding graphs (represented by their adjacency
matrices). One of the motivations for the SKG model
was that it had a fractal or self-similar structure. It
appears that the CL graph shares the same self-similarity.
Furthermore, this self-repetition looks identical for
both the SKG and CL graphs.

Figure 6: Spy plots of an SKG graph drawn from $P_{SKG}$ and a CL graph drawn from $P_{CL}$.

5 Mathematical justifications

We prove that when the entries of the matrix $T$ satisfy
the condition $t_1/t_2 = t_3/t_4$, then SKG is identical to
the CL model.

Theorem 5.1. Consider an SKG model where $T$ satisfies
the following:

\[ \frac{t_1}{t_2} = \frac{t_3}{t_4}. \]

Then $P_{SKG} = P_{CL}$.

Proof. Let $\alpha = t_2/t_4$ and let $t_3 = \beta t_2$. Since
$t_1/t_2 = t_3/t_4$, we also have $t_1/t_3 = t_2/t_4 = \alpha$. Then,
$t_1 = \alpha^2 \beta t_4$, $t_3 = \alpha \beta t_4$, and $t_2 = \alpha t_4$. Note that since
$t_1 + t_2 + t_3 + t_4 = 1$,

\[ (\alpha^2 \beta + \alpha + \alpha \beta + 1)\, t_4 = 1. \tag{5.1} \]

We use the formula given in Claim 4.1 for the $(i, j)$
entry of the SKG and CL matrices.

By simple substitution, the entry for SKG is

\[ (t_4 \alpha^2 \beta)^{c_z} (t_4 \alpha)^{z_i - c_z} (t_4 \alpha \beta)^{z_j - c_z}\, t_4^{\ell - z_i - z_j + c_z} = t_4^{\ell}\, \alpha^{z_i + z_j} \beta^{z_j}. \tag{5.2} \]

Analogously, for the CL matrix, the entry has value

\begin{align*}
& (t_1 + t_2)^{z_i} (t_3 + t_4)^{\ell - z_i} (t_1 + t_3)^{z_j} (t_2 + t_4)^{\ell - z_j} \\
&= [t_4(\alpha^2 \beta + \alpha)]^{z_i} [t_4(\alpha \beta + 1)]^{\ell - z_i} [t_4(\alpha^2 \beta + \alpha \beta)]^{z_j} [t_4(\alpha + 1)]^{\ell - z_j} \\
&= t_4^{2\ell} (\alpha^2 \beta + \alpha)^{z_i} (\alpha \beta + 1)^{\ell - z_i} (\alpha^2 \beta + \alpha \beta)^{z_j} (\alpha + 1)^{\ell - z_j} \\
&= t_4^{2\ell}\, \alpha^{z_i + z_j} \beta^{z_j} (\alpha \beta + 1)^{z_i} (\alpha \beta + 1)^{\ell - z_i} (\alpha + 1)^{z_j} (\alpha + 1)^{\ell - z_j} \\
&= t_4^{2\ell}\, \alpha^{z_i + z_j} \beta^{z_j} (\alpha^2 \beta + \alpha + \alpha \beta + 1)^{\ell} \\
&= t_4^{\ell}\, \alpha^{z_i + z_j} \beta^{z_j} [t_4 (\alpha^2 \beta + \alpha + \alpha \beta + 1)]^{\ell}.
\end{align*}

By (5.1), the bracketed factor in the last line equals 1.
The entry is therefore exactly the same as (5.2). □
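
Theorem 5.1 can also be checked numerically on a small instance. The sketch below is only a sanity check, not part of the proof: the function names and the example matrix `proportional` (chosen so that $t_1/t_2 = t_3/t_4 = 4$) are assumptions made for this illustration. The Graph500 matrix, for which $t_1/t_2 = 3$ but $t_3/t_4 = 3.8$, is included for contrast.

```python
def kron_power(T, levels):
    """Explicit SKG matrix: T Kronecker-multiplied with itself `levels` times."""
    P = [[1.0]]
    for _ in range(levels):
        n = len(P)
        P = [[P[i // 2][j // 2] * T[i % 2][j % 2] for j in range(2 * n)]
             for i in range(2 * n)]
    return P

def max_gap(T, levels):
    """Largest absolute difference between an SKG entry and the matching
    entry of the associated CL matrix (row sum times column sum)."""
    P = kron_power(T, levels)
    n = len(P)
    row = [sum(P[i]) for i in range(n)]
    col = [sum(P[i][j] for i in range(n)) for j in range(n)]
    return max(abs(P[i][j] - row[i] * col[j]) for i in range(n) for j in range(n))

proportional = [[0.48, 0.12], [0.32, 0.08]]  # hypothetical T with t1/t2 = t3/t4
graph500 = [[0.57, 0.19], [0.19, 0.05]]      # does not satisfy the condition

gap_proportional = max_gap(proportional, 4)
gap_graph500 = max_gap(graph500, 4)
```

Under these assumptions `gap_proportional` vanishes up to rounding error, while `gap_graph500` is small but clearly nonzero, in line with the remark that common SKG parameters do not satisfy the condition exactly.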

6 Fitting SKG vs CL 

Fitting procedures for SKG model have been given in 
[3]. This is often cited as a reason for the popularity of 
SKG. These fits are based on algorithms for maximizing 
likelihood, but can take a significant amount of time to 
run. The CL model is fit by simply taking the degree 
distribution of the original graph. Note that the CL 
model uses a lot more parameters than SKG, which 
only requires 5 independent numbers. In that sense, 
SKG is a very appealing model regardless of any other 
deficiencies. 
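
To make the contrast concrete, fitting CL to an undirected graph amounts to reading off its degree sequence, after which new graphs can be sampled directly. The sketch below is a minimal illustration; the names `fit_cl` and `sample_cl` are hypothetical, and the sampler uses the $m$-insertion variant of CL described in Section 1.2 rather than independent coin flips.

```python
import random
from collections import Counter

def fit_cl(edges):
    """'Fit' CL to an undirected edge list: return its degree sequence."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def sample_cl(deg, m, seed=0):
    """Make m undirected edge insertions, choosing both endpoints of every
    edge independently with probability proportional to degree."""
    rng = random.Random(seed)
    nodes = list(deg)
    weights = [deg[v] for v in nodes]
    ends = rng.choices(nodes, weights=weights, k=2 * m)
    return list(zip(ends[:m], ends[m:]))

# Toy input graph: a star centred at vertex 1.
star = [(0, 1), (1, 2), (1, 3)]
deg = fit_cl(star)
synthetic = sample_cl(deg, m=len(star))
```

The fitted model has one parameter per vertex, which is the price paid for matching the degree distribution in expectation.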

We show comparisons of the CL, SKG, and NSKG 
models with respect to three different real graphs. For 
directed graphs, we look at the undirected version where 



directions are removed from all the edges. The real 
graphs are the following: 

• soc-Epinions: This is a social network from the
Epinions website, which tracks the "who-trusts-whom"
relationship [36]. It has 75879 vertices and 811480
edges. The SKG parameters for this graph from [3] are:
$T = [0.4668, 0.2486; 0.2243, 0.0603]$, $\ell = 17$.

• ca-HepTh: This is a co-authorship network from
high energy physics [36]. It has 9875 vertices and 51946
edges. The SKG parameters for this graph from [3] are:
$T = [0.469455, 0.127350; 0.127350, 0.275846]$, $\ell = 14$.

• cit-HepPh: This is a citation network from high
energy physics [36]. It has 34546 vertices and 841754
edges. The SKG parameters from [3] are:
$T = [0.429559, 0.189715; 0.153414, 0.227312]$, $\ell = 15$.

The comparisons between the properties are given,
respectively, in Fig. 7, Fig. 8, and Fig. 9. In all of these,
we see that CL (as expected) gives good fits to the
degree distributions. For soc-Epinions, we see in Fig. 7a
the oscillations of the SKG degree distribution
and how NSKG smooths them out. Observe that the
clustering coefficients of all the models are completely
off. Indeed, for low degree vertices, the values are off by
orders of magnitude. Clearly, no model is capturing the
abundance of triangles in these graphs. The eigenvalues
of the model graphs are also distant from those of the
real graph, but CL performs no worse than SKG (or NSKG).
Core decompositions for soc-Epinions (Fig. 7d) show
that CL fits rather well. For ca-HepTh (Fig. 8d), CL
is marginally better than SKG, whereas for cit-HepPh
(Fig. 9d), NSKG seems to be a better match.

All in all, there is no conclusive evidence that SKG
or NSKG model these graphs significantly better than
CL. We feel that the comparable performance of CL
shows that it should be used as a control model to
compare against.

Figure 7: The figure compares the fits of various models for the social network soc-Epinions.

7 Conclusions 

Understanding existing graph models is a very impor- 
tant part of graph analysis. We need to clearly see the 
benefits and shortcomings of existing models, so that 
we can use them more effectively. For these purposes, 
it is good to have a simple "baseline" model to compare 
against. We feel that the CL model is quite suited for 
this because of its efficiency, simplicity, and similarity 
to SKG. Especially for benchmarking purposes, it is a 
good candidate for generating simple test graphs. One 
should not think of this as representing real data, but
as an easy way of creating reasonable looking graphs.
Comparisons with the CL model can give more insight 
into current models. The similarities and differences 
may help identify how current graph models differ from 
each other. 